Abstract

This paper shows that the Working Set (WS) parameter-real memory and real memory-fault rate anomalies described by Franklin, Graham, and Gupta in [FrGG78] do occur in traces generated by real programs. The results of a detailed investigation of this anomalous behavior in four Fortran programs are presented. In some cases, the average memory allotment drops by a factor of two when the window size is increased. In some instances, a larger memory allotment is accompanied by an order-of-magnitude increase in page faults.


Keywords:

Memory Management
Multiprogramming
Working Set Anomaly
Program Behavior


1. Introduction

Franklin, Graham, and Gupta have shown in [FrGG78] that the page fault frequency policy of memory management can exhibit anomalous behavior for some reference strings. They gave short example reference strings to illustrate their ideas and pointed out that real programs exhibit this anomalous behavior [Grah76], [Gupt74]. For the working set policy, WS, they gave an example reference string to demonstrate that certain anomalous behavior is also possible. However, they did not report encountering these WS anomalies in real programs. The WS anomalies were encountered experimentally in the spring of 1978 by one of us while working on the development of automatic program transformations to improve program behavior in a virtual memory environment [AbuS78], [AbKL79], [AbKL80].

Before proceeding, we describe the notation used in this paper and define the two types of anomalies to be considered. The WS policy keeps in memory the pages referenced during the previous τ memory references, where τ, the window size, is the WS control parameter. This set of pages is the working set and is denoted by W(τ,t) at time t. Its size is w(τ,t). The average memory allocated to a program during its execution is given by

$$M(\tau,L) = \frac{\sum_{t=1}^{R} w(\tau,t) + L \sum_{i=1}^{f(\tau)} w_i(\tau,t_i)}{R + L\,f(\tau)} \qquad (1)$$


where

R = the length of the reference string
L = the mean page fault service time
f(τ) = the number of page faults
w_i(τ,t_i) = the working-set size at the ith page fault, which occurs at time t_i.


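We say that there is a parameter-real memory anomaly when M(τ_1,L) > M(τ_2,L) for some τ_1 < τ_2 and some L, and a real memory-fault rate anomaly when, for some τ_1 ≠ τ_2 and some L, M(τ_1,L) < M(τ_2,L) and f(τ_1) < f(τ_2) [FrGG78]. In words, the first anomaly means that a larger window yields a smaller average memory allotment; the second means that a smaller memory allotment is accompanied by fewer page faults.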

In the 1978 experiments mentioned above, some of the transformed programs and some of the untransformed ones showed both anomalies. In those experiments, only references to array elements were considered. The page size was 64 words, and we used three values for L: 32, 320, and 3200 references. Table 1 summarizes our findings for these programs, and Table 2 shows some of their characteristics. In Table 1, we can see that the two anomalies defined above arise in all the programs listed for at least one value of L. Consider, for example, the program BASE when L = 32. When τ is 6, the average memory used is 2.33 pages and the number of page faults is 681. When τ is increased to 8, the average memory decreases to 1.96 pages and the number of page faults decreases drastically to 374. The decrease in average memory when τ increases illustrates the parameter-real memory anomaly, and the decrease in the number of page faults when the average memory decreases illustrates the real memory-fault rate anomaly.
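The two definitions can be checked mechanically on measured points. The sketch below is a minimal illustration, assuming each measurement is a (τ, M, f) triple taken at one fixed value of L; the names `Point` and `find_anomalies` are hypothetical. It is applied to the two BASE points quoted above for L = 32.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple

class Point(NamedTuple):
    tau: int        # window size
    memory: float   # average memory allotment M(tau, L), in pages
    faults: int     # number of page faults f(tau)

Pair = Tuple[int, int]

def find_anomalies(points: List[Point]) -> Tuple[List[Pair], List[Pair]]:
    """Return the window-size pairs that show each anomaly.

    All points are assumed to be measured at the same L. The first list holds
    (tau_1, tau_2) with tau_1 < tau_2 and M(tau_1) > M(tau_2); the second holds
    (tau_1, tau_2) with M(tau_1) < M(tau_2) and f(tau_1) < f(tau_2).
    """
    param_memory: List[Pair] = []
    memory_fault: List[Pair] = []
    for p, q in combinations(sorted(points), 2):  # sorted by tau: p.tau <= q.tau
        if p.tau == q.tau:
            continue
        if p.memory > q.memory:  # larger window, smaller average memory
            param_memory.append((p.tau, q.tau))
        less, more = (p, q) if p.memory < q.memory else (q, p)
        if less.memory < more.memory and less.faults < more.faults:
            memory_fault.append((less.tau, more.tau))  # less memory, fewer faults
    return param_memory, memory_fault

# The two BASE measurements quoted above for L = 32 (array references only).
base = [Point(6, 2.33, 681), Point(8, 1.96, 374)]
print(find_anomalies(base))
```

Both anomalies are reported for the pair of window sizes 6 and 8, matching the discussion above.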

Recently, Denning [Denn80] argued that it is unlikely that any nonlookahead policy better than WS will be discovered. Thus, it seems that WS dispatchers should be widely used in future computer systems. However, load control of a multiprogrammed system that is based on an anomalous memory management policy may be unstable, since a change of a given sign in the parameter might not produce a change of corresponding sign in the controlled variable [FrGG78]. This has convinced us of the importance of a thorough investigation¹ and analysis of the WS anomalies in real programs, which is the subject of this paper.

Specifically, we study in detail four of the untransformed programs used in our previous experiments, namely BASE, FOURTR, INIT, and PAPUAL. Table 3 shows some of their characteristics.² In our crude previous investigations, BASE and PAPUAL did not show any anomalies, while INIT and FOURTR did. We start Section 2 of this paper with a brief discussion of the trace generation and processing method. Then we present the results of our experiments and their analysis. In Section 3, we make some concluding remarks.


¹ The investigation whose results are displayed in Table 1 was not thorough.

² The values in Table 3 assume column-major storage for two-dimensional arrays, whereas those in Table 2 assume the square-block storage scheme.


2. The Results and Their Analysis
2.1 The Experimentation Method

The page size in our experiments was 256 8-bit bytes. The main memory access time was taken as the time unit, and the average access time to secondary memory was defined as L time units. The instruction set was assumed to be that of the IBM 370.

We assumed the use of segments [Denn70], each consisting of one or more pages. For each program, one segment was allocated to the code, one to each array, and one to the scalar variables. Each variable was assumed to be four bytes long, and two-dimensional arrays were assumed to be stored in column-major order.
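Under these assumptions, the page touched by an array reference follows directly from its indices. The sketch below assumes that each segment starts on a page boundary and that arrays use Fortran's 1-based indexing; `column_major_offset` and `page_of` are hypothetical helpers, not part of the authors' PL/I tool. With 4-byte elements, a 256-byte page holds 64 elements.

```python
PAGE_BYTES = 256  # page size used in the experiments
ELEM_BYTES = 4    # every variable is assumed to be four bytes long

def column_major_offset(i: int, j: int, nrows: int) -> int:
    """Byte offset of A(i, j) within its segment (1-based Fortran indices)."""
    return ((j - 1) * nrows + (i - 1)) * ELEM_BYTES

def page_of(i: int, j: int, nrows: int) -> int:
    """Page of A(i, j) within its segment, assuming the segment is page-aligned."""
    return column_major_offset(i, j, nrows) // PAGE_BYTES

# A hypothetical array A(100, 100): each column occupies 400 bytes.
n = 100
column_pages = {page_of(i, 1, n) for i in range(1, n + 1)}  # sweep down column 1
row_pages = {page_of(1, j, n) for j in range(1, n + 1)}     # sweep along row 1
print(len(column_pages), len(row_pages))
```

Sweeping a column touches only 2 pages, while sweeping a row touches 100 distinct pages, which is why the storage scheme matters so much for the traces studied here.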

The tool we used to do the experiments is sketched in Fig. 1. It consists of two components, a trace generator and a simulator, both written in PL/I. The trace generator reads a Fortran source program and the output of the Fortran G compiler for the same source program. The source program is interpreted to generate the address trace. The object program is used only to determine which machine instructions, and how many of them, are fetched for each Fortran statement. This approach to trace generation has some advantages. It allows us to generate subsets of the addresses in the trace; for example, we can generate array addresses only, or instruction addresses only. Also, the storage strategy can be changed without having to change the compiler; for example, two-dimensional arrays can be stored by rows, by columns, or in square blocks [McCo69]. For performance reasons, not all assignment statements in the source program are interpreted, only those that determine the address trace. For example, in the program


      DIMENSION A(100), B(100)
      DO 1 I = 1, 100
    1 A(I) = B(I) + C
      STOP
      END

the assignment statement does not have to be interpreted, because the address trace is the same whatever values are computed for A(I). The information on which assignment statements to interpret is supplied to the trace generator by the user.
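The example loop also illustrates how an array-only subset of the trace can be produced. The sketch below is hypothetical: it assumes A and B occupy separate page-aligned segments (numbered here 1 and 2), that the compiled statement reads B(I) and C before storing A(I), and that the scalar C sits at offset 0 of a scalar segment (numbered 0). Instruction fetches are omitted, and the function name is illustrative, not the authors' PL/I code.

```python
from typing import Iterator, Tuple

PAGE_BYTES, ELEM_BYTES = 256, 4
SCALAR_SEG, SEG_A, SEG_B = 0, 1, 2  # hypothetical segment numbers

Ref = Tuple[int, int]  # (segment, page within segment)

def loop_trace(n: int = 100, arrays_only: bool = True) -> Iterator[Ref]:
    """Data references of: DO 1 I = 1, N / 1 A(I) = B(I) + C."""
    for i in range(1, n + 1):
        page = (i - 1) * ELEM_BYTES // PAGE_BYTES  # same page index in A and B
        yield (SEG_B, page)            # read B(I)
        if not arrays_only:
            yield (SCALAR_SEG, 0)      # read the scalar C
        yield (SEG_A, page)            # store A(I)

array_refs = list(loop_trace())
mixed_refs = list(loop_trace(arrays_only=False))
print(len(array_refs), len(mixed_refs), sorted(set(array_refs)))
```

Note that the values stored into A never enter the computation of the trace, which is exactly why such assignments need not be interpreted.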

The simulator of the WS policy is fairly straightforward: it takes a range of values and an increment for τ, and produces all the information required to compute the average memory for those values of τ. Since the average working-set size at fault time does not satisfy the inclusion property, we were not able to use a stack algorithm [MGST70]. This made the simulation costly in terms of computer time. In contrast, trace generation was so cheap that we decided not to store the trace on tape but to regenerate it each time a new simulation was done. The simulator was used as a subroutine of the trace generator.
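A minimal sketch of such a simulator is shown below, assuming page references are given as hashable page identifiers, that a reference faults when its page was not referenced during the previous τ references, and that w_i(τ,t_i) is taken as the working-set size just after the faulting page has been added (the exact convention is an assumption). Because the fault-time sums do not have the inclusion property, the function is simply rerun for every τ. The names `WSStats` and `simulate_ws` and the string "abacab" are illustrative.

```python
import math
from collections import Counter, deque
from dataclasses import dataclass
from typing import Deque, Hashable, Iterable

@dataclass
class WSStats:
    R: int            # length of the reference string
    faults: int       # f(tau)
    sum_w: int        # sum over t of w(tau, t)
    sum_w_fault: int  # sum over faults of w_i(tau, t_i)

    def average_memory(self, L: float) -> float:
        """M(tau, L) from equation (1); L = inf gives sum_w_fault / faults."""
        if math.isinf(L):
            return self.sum_w_fault / self.faults
        return (self.sum_w + L * self.sum_w_fault) / (self.R + L * self.faults)

def simulate_ws(refs: Iterable[Hashable], tau: int) -> WSStats:
    """Run the WS policy once with window size tau >= 1."""
    window: Deque[Hashable] = deque()  # the most recent tau references
    counts: Counter = Counter()        # occurrences of each page in the window
    R = faults = sum_w = sum_w_fault = 0
    for page in refs:
        R += 1
        is_fault = counts[page] == 0   # page absent from the previous tau references
        window.append(page)
        counts[page] += 1
        if len(window) > tau:          # slide the window by one reference
            old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        size = len(counts)             # w(tau, t), including the current page
        sum_w += size
        if is_fault:
            faults += 1
            sum_w_fault += size        # assumed convention for w_i(tau, t_i)
    return WSStats(R, faults, sum_w, sum_w_fault)

# One separate pass per window size, as a non-stack simulation requires.
for tau in range(1, 4):
    s = simulate_ws("abacab", tau)
    print(tau, s.faults, s.average_memory(0), s.average_memory(math.inf))
```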

2.2 The Results

2.2.1 The Page Faults vs. Window Size Curves

Fig. 2(a) shows the curve of page faults vs. window size τ for the array reference string of program BASE. Fig. 2(b) shows this curve when references to scalar variables, constants, and instructions are included in the reference string. Figs. 3, 4, and 5 show similar curves for the rest of the programs. The graphs for all the programs share some common characteristics.

The general shape of the page fault curves does not change when references to scalar variables and instructions are included. This is evident from a comparison of Figs. 2(a) and 2(b). Thus the paging behavior of our programs, and of numerical programs in general, is mainly controlled by references to array variables (see also Figs. 6-9). The same argument was made in [MaBa76]. This is expected because the number of array pages referenced in numerical programs is much larger than the number of scalar and instruction pages (Table 3). However, the total number of references in a string is about one order of magnitude larger than the number of references to array elements. This explains the order-of-magnitude shift along the window size axis between the curves in Figs. 2(a) and 2(b).
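The shift can be reproduced on a synthetic string. In the hypothetical sketch below, every array reference is followed by nine references to a single code page, so the mixed string is ten times longer than the array string. Under that assumption, a window of 10τ over the mixed string covers exactly the array references that a window of τ covers over the array string, so the array page faults coincide. The loop pattern and the `ws_faults` helper are illustrative only.

```python
def ws_faults(refs, tau):
    """Count WS page faults: a reference faults when its page was not
    referenced during the previous tau references."""
    last = {}   # page -> position of its most recent reference
    faults = 0
    for t, page in enumerate(refs):
        if page not in last or t - last[page] > tau:
            faults += 1
        last[page] = t
    return faults

# Hypothetical array string: a loop sweeping 8 array pages, repeated 50 times.
array_refs = [("array", p) for _ in range(50) for p in range(8)]

# Hypothetical mixed string: 9 code-page references after every array reference.
mixed_refs = []
for ref in array_refs:
    mixed_refs.append(ref)
    mixed_refs.extend([("code", 0)] * 9)

for tau in (2, 4, 8, 16):
    # f(mixed, 10*tau) equals f(array, tau) plus one cold fault on the code page.
    print(tau, ws_faults(array_refs, tau), ws_faults(mixed_refs, 10 * tau))
```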

The curves are not smooth. This can be explained simply by the fact that numerical programs mainly execute loops. Thus, when the window size is increased to the point where all the pages referenced in one iteration of a loop are included in the same working set, a sudden drop in the number of page faults occurs.
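A toy loop shows the step. The sketch below assumes a hypothetical loop body that touches 5 distinct pages in the same order on each of 20 iterations (R = 100); until the window spans a whole iteration, every reference finds its page outside the working set.

```python
def ws_faults(refs, tau):
    """Count WS page faults (page not referenced in the previous tau references)."""
    last, faults = {}, 0
    for t, page in enumerate(refs):
        if page not in last or t - last[page] > tau:
            faults += 1
        last[page] = t
    return faults

# Hypothetical loop body touching 5 pages, executed 20 times (R = 100).
loop = [p for _ in range(20) for p in range(5)]
curve = [ws_faults(loop, tau) for tau in range(1, 9)]
print(curve)  # [100, 100, 100, 100, 5, 5, 5, 5]
```

The fault count falls from 100 to 5 exactly when τ reaches the length of one iteration; real loops with several nested levels produce several such steps.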


2.2.2 The Average Memory Allotment vs. Window Size and the Page Faults vs. Average Memory Curves

Figs. 6-9 show the average memory allotment vs. window size curves for L = 0 and L → ∞; in these figures, the solid curves are for L = 0 and the dashed curves are for L → ∞. The average memory allotment at a given window size τ = τ_1 can be either an increasing or a decreasing function of L. We have

$$M(\tau_1,L) = \frac{\sum_{t=1}^{R} w(\tau_1,t) + L \sum_{i=1}^{f(\tau_1)} w_i(\tau_1,t_i)}{R + L\,f(\tau_1)} = \frac{A(\tau_1) + B(\tau_1)\,L}{R + C(\tau_1)\,L} \qquad (2)$$

where A(τ_1) is the sum of w(τ_1,t) over t, B(τ_1) is the sum of w_i(τ_1,t_i) over the page faults, and C(τ_1) = f(τ_1). Thus,

$$\frac{\partial M(\tau_1,L)}{\partial L} = \frac{B(\tau_1)\,R - C(\tau_1)\,A(\tau_1)}{\bigl(R + C(\tau_1)\,L\bigr)^2}$$

Since the sign of this derivative does not depend on L, M(τ_1,L) is a monotonic function of L. Therefore, M(τ_1,L) always falls between M(τ_1,0) = A(τ_1)/R and M(τ_1,∞) = B(τ_1)/C(τ_1), and the curves for L = 0 and L → ∞ are envelopes of all the other curves for 0 < L < ∞. Fig. 10 shows M(τ,0), M(τ,20), M(τ,200), M(τ,2000), and M(τ,∞) for program BASE (array references only). We observe that the curves for M(τ,2000) and M(τ,∞) are fairly close to each other. This is also true for all the other programs. Thus, the M(τ,∞) curve is a good approximation to the M(τ,L) curves when L > 2000.
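The monotonicity argument can be checked numerically. The sketch below evaluates equation (2) and its derivative for hypothetical sums A = 11, B = 7, C = 4 with R = 6 (illustrative values, not taken from the paper's tables); here B·R - C·A = 42 - 44 < 0, so M decreases with L and stays between A/R and B/C.

```python
import math

def average_memory(A: float, B: float, C: float, R: float, L: float) -> float:
    """Equation (2): M(tau_1, L) = (A + B*L) / (R + C*L); L = inf gives B / C."""
    if math.isinf(L):
        return B / C
    return (A + B * L) / (R + C * L)

def dM_dL(A: float, B: float, C: float, R: float, L: float) -> float:
    """Derivative of equation (2) with respect to L; its sign ignores L."""
    return (B * R - C * A) / (R + C * L) ** 2

A, B, C, R = 11.0, 7.0, 4.0, 6.0  # hypothetical sums for one window size
values = [average_memory(A, B, C, R, L) for L in (0, 20, 200, 2000, math.inf)]
print(values)
print(dM_dL(A, B, C, R, 0.0) < 0)  # M(tau_1, L) decreases with L for these sums
```

As in Fig. 10, the values for L = 2000 and L → ∞ are already very close.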

Figs. 11-14 show the page faults vs. average memory allotment curves for the programs (L = 0 and L → ∞). From Figs. 6-14, we see that the parameter-real memory and the real memory-fault rate anomalies occur in all programs. Table 4 shows the points at which the anomalies are most significant.

Fig. 15 shows a typical section of the memory allotment curve where anomalous behavior is seen. Let τ_1 and τ_2 (τ_1 < τ_2) be the window sizes at the local maximum and the local minimum of this section, with M_1 = M(τ_1,L) > M_2 = M(τ_2,L). We note that the line M = M_2 intersects the curve at τ = τ_a < τ_1, and the line M = M_1 intersects it at τ = τ_b > τ_2. We then observe the following:

(i) There will be anomalies between τ_1 and all τ's in the open interval (τ_1, τ_b).

(ii) There will be anomalies between τ_2 and all τ's in the open interval (τ_a, τ_2).

(iii) When generating the data for plotting the curve, if the increment in τ, Δτ, was greater than or equal to τ_b - τ_a, then the anomalous behavior in this section of the curve will not show up, because at most one sampled window size can fall inside the open interval (τ_a, τ_b). We remark that if every possible anomalous section of the curve is to show, then Δτ must be taken to be 1.
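Observation (iii) can be illustrated with a small synthetic curve. The sketch below assumes a hypothetical curve sampled at every integer τ, with a local maximum at τ = 4 and a local minimum at τ = 6; by linear interpolation, τ_a ≈ 3.25 and τ_b ≈ 7.44, so τ_b - τ_a ≈ 4.2. The helper names are illustrative.

```python
def anomalous_pairs(curve):
    """All (x, y) with x < y and M(x) > M(y); `curve` maps integer tau to M."""
    taus = sorted(curve)
    return [(x, y) for k, x in enumerate(taus) for y in taus[k + 1:]
            if curve[x] > curve[y]]

def visible(curve, start, step):
    """Does sampling tau = start, start + step, ... reveal any anomaly?"""
    sampled = {t: m for t, m in curve.items()
               if t >= start and (t - start) % step == 0}
    return bool(anomalous_pairs(sampled))

# Hypothetical curve: local maximum at tau = 4 (M = 5.0), minimum at tau = 6 (3.5).
curve = {1: 1.0, 2: 2.0, 3: 3.0, 4: 5.0, 5: 4.0,
         6: 3.5, 7: 4.2, 8: 6.0, 9: 7.0, 10: 8.0}
print(anomalous_pairs(curve))  # [(4, 5), (4, 6), (4, 7), (5, 6)]
# With an increment of 5 (>= tau_b - tau_a), no starting point exposes the anomaly.
print(any(visible(curve, s, 5) for s in range(1, 6)))
```

The pairs found agree with (i) and (ii): τ_1 = 4 pairs with 5, 6, and 7, and τ_2 = 6 pairs with 4 and 5.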

Table 5 shows τ_a, τ_b, and Δτ for the anomalous regions of the average memory allotment curves for program INIT. We note that in the array reference string, no anomalies will be discovered if τ is always changed by increments greater than 946 references. For the mixed string, no anomalies will be discovered if the increments of τ are greater than 12375. We note that for this program the maximum anomalous change in the average memory allotment is 20.5 pages.

Program FOURTR shows a more complicated anomalous behavior. We notice that there are overlapping anomalous regions. Thus, for array references, no anomalies will be discovered if τ is incremented by values greater than 2320 - 60 = 2260 (the largest τ_b minus the smallest τ_a of the regions). For the mixed reference string, this number is 17205.

Each one of the eight traces considered in detail in this paper (the array and the mixed reference strings of the four programs) was studied for a limited range of the window size, as can be seen in the figures. In Table 6, we show the minimum space-time product⁵ inside this range and the window size at which this minimum is reached. A lower bound on the space-time product for window sizes beyond the range studied is also shown. This lower bound was computed with the following formula:

$$LB(\tau^*,L) = ST(\tau^*,L) - \bigl(f(\tau^*) - w(R,R)\bigr)\, w(R,R)\, L$$

where τ* is the largest value of the window size used in the experiments, and w(R,R), the working-set size at time R with a window of R references, is the number of distinct pages referenced in the whole string.
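The bound is a direct computation once the statistics at τ* are known. The sketch below uses hypothetical statistics for a six-reference string with 3 distinct pages (sum of w = 13, sum of w_i = 9, and f = 4 at τ* = 3; at a larger window the sums become 14 and 6); these are illustrative numbers, not entries of Table 6.

```python
def space_time(sum_w: float, sum_w_fault: float, L: float) -> float:
    """ST(tau, L): sum over t of w(tau, t) plus L times the sum of w_i(tau, t_i)."""
    return sum_w + L * sum_w_fault

def lower_bound(st_star: float, faults_star: int, distinct_pages: int, L: float) -> float:
    """LB(tau*, L) = ST(tau*, L) - (f(tau*) - w(R, R)) * w(R, R) * L."""
    return st_star - (faults_star - distinct_pages) * distinct_pages * L

L = 100.0
st_star = space_time(13, 9, L)        # hypothetical statistics at tau* = 3
lb = lower_bound(st_star, 4, 3, L)    # f(tau*) = 4, w(R, R) = 3
st_larger = space_time(14, 6, L)      # hypothetical statistics at tau = 5
print(st_star, lb, st_larger >= lb)
```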


These numbers, and those presented in the rest of the discussion, are for L → ∞.

⁵ When calculating the space-time cost, we use the expression

$$ST(\tau,L) = \sum_{t=1}^{R} w(\tau,t) + L \sum_{i=1}^{f(\tau)} w_i(\tau,t_i)$$
t=l  i=l 


The first sum in the above equation is equal to R·M(τ,0). The second sum is equal to f(τ)·M(τ,∞). In the literature [DKLP76], [Denn78], [GrDe77], the second sum is sometimes approximated by f(τ)·M(τ,0). Thus, the approximate expression for ST, denoted ŜT, is given by

$$\widehat{ST}(\tau,L) = R\,M(\tau,0) + L\,\frac{f(\tau)}{R}\,M(\tau,0)\,R = R\,M(\tau,0)\left(1 + L\,\frac{f(\tau)}{R}\right)$$

While ŜT(τ,L) can easily be calculated for all τ from statistics generated after one scan of a reference string, this is not possible for ST(τ,L). Thus, calculating ST is much more expensive than calculating ŜT. Graham reports that this approximation can be in error by as much as 20% [Grah76]. For our programs, we found that the error can be as high as 70%. Figs. 16 through 19 show the relative error curves for our programs ((ST - ŜT)/ST vs. τ).

When ŜT is used to approximate ST, the VMIN algorithm [PrFa76] is usually taken as the optimal algorithm for minimizing the space-time product. An algorithm that minimizes the exact space-time product, DMIN, was developed in [BDMS80]. In [BuDa80], it is shown that DMIN outperforms VMIN; the authors also show that WS outperforms VMIN in some instances.
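The exact and approximate space-time expressions in footnote 5 can be compared directly. The sketch below assumes the relative error is defined as (ST - ŜT)/ST and uses hypothetical values R = 6, M(τ,0) = 11/6, M(τ,∞) = 7/4, f(τ) = 4 (illustrative, not measured). Since ŜT - ST = L·f(τ)·(M(τ,0) - M(τ,∞)), the error can have either sign, depending on whether M(τ,L) decreases or increases with L.

```python
def st_exact(R: float, m0: float, m_inf: float, faults: int, L: float) -> float:
    """ST = R*M(tau,0) + L*f(tau)*M(tau,inf)."""
    return R * m0 + L * faults * m_inf

def st_approx(R: float, m0: float, faults: int, L: float) -> float:
    """Approximation with the second sum replaced by f(tau)*M(tau,0)."""
    return R * m0 * (1 + L * faults / R)

def relative_error(R: float, m0: float, m_inf: float, faults: int, L: float) -> float:
    """Assumed definition: (ST - ST_hat) / ST."""
    exact = st_exact(R, m0, m_inf, faults, L)
    return (exact - st_approx(R, m0, faults, L)) / exact

R, m0, m_inf, f = 6, 11 / 6, 7 / 4, 4   # hypothetical statistics for one tau
for L in (10.0, 1000.0):
    print(L, st_exact(R, m0, m_inf, f, L), st_approx(R, m0, f, L),
          relative_error(R, m0, m_inf, f, L))
```

Here M(τ,0) > M(τ,∞), so the approximation overestimates the space-time product and the relative error is negative.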


It is easy to see that for τ ≥ τ*, ST(τ,L) ≥ LB(τ*,L). From the information above, we can see that anomalies occur for window sizes both smaller and larger than those at which the minimum space-time product occurs.
3. Conclusion

The results of the experiments reported in this paper show that the anomalous behavior of the WS policy is not insignificant. A change of a given sign in the window size can cause a change of more than 200% in the average real memory allotted to a program, in the unexpected direction, and a corresponding change of one order of magnitude in the page fault rate (see Table 4). Thus, this anomalous behavior of the WS policy cannot simply be ignored.

In real computer systems, people are interested in turnaround time, throughput, and the degree of multiprogramming. The turnaround time of a job is related to its CPU execution time and its paging rate. The window size of the WS policy controls the paging rate without problems: since W(τ,t) ⊆ W(τ+1,t) for all t, the number of page faults f(τ) is a nonincreasing function of τ. In other words, the WS policy does not have the parameter-fault rate anomaly. Thus, the turnaround time of a job can be safely controlled under the WS policy.
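The inclusion argument can be checked on any reference string. The sketch below uses a hypothetical random string of 2000 references over 12 pages; `fault_times` is an illustrative helper. Every reference that faults with window τ + 1 also faults with window τ, so the fault sets are nested and f(τ) never rises, even though, as shown above, the average memory allotment need not behave monotonically.

```python
import random

def fault_times(refs, tau):
    """Positions at which the WS policy with window tau takes a page fault."""
    last, times = {}, set()
    for t, page in enumerate(refs):
        if page not in last or t - last[page] > tau:
            times.add(t)
        last[page] = t
    return times

random.seed(1)
refs = [random.randrange(12) for _ in range(2000)]  # hypothetical random string
faults = [fault_times(refs, tau) for tau in range(1, 40)]
# Faults with window tau + 1 form a subset of the faults with window tau.
nested = all(bigger <= smaller for smaller, bigger in zip(faults, faults[1:]))
print(nested, [len(f) for f in faults[:5]])
```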

The throughput of a system, however, depends on its degree of multiprogramming [Denn78], which itself depends on the average real memory allotted to programs during their execution. Thus, the results in this paper cast some doubt on the reliability of the window size as a control of the degree of multiprogramming. However, the limited scope of this work precludes making any final statement on this subject. More experiments, including experiments with nonnumerical programs, are required before such a statement can be made.